Skip to content

Userdata to display static NAT as public ip instead of VR ip#3862

Merged
andrijapanicsb merged 2 commits into
apache:masterfrom
ravening:display_static_nat
Mar 5, 2020
Merged

Userdata to display static NAT as public ip instead of VR ip#3862
andrijapanicsb merged 2 commits into
apache:masterfrom
ravening:display_static_nat

Conversation

@ravening
Copy link
Copy Markdown
Member

@ravening ravening commented Feb 4, 2020

Description

If static nat is enabled on VM then metadata service should return
the static nat instead of gateway IP.
If static not is not enabled then it should return the gateway IP
as the public IP

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

Step to reproduce:

  1. Create a vm

  2. Ssh to vm.

  3. Run the below command inside the vm
    wget http://<VR public ip>/latest/meta-data/public-ipv4

    Note down the output of the above command

  4. Now acquire a new public and enable static NAT on that IP to this vm

  5. Now run the same command mentioned above in the VM
    This should display the static NAT ip instead of VR public IP

Output:

Before enabling static nat

wget http://10.10.10.40/latest/meta-data/public-ipv4
$ cat public-ipv4
10.10.10.29

After enabling static nat

wget http://10.10.10.40/latest/meta-data/public-ipv4
$ cat public-ipv4
10.11.10.30

@yadvr yadvr requested a review from DaanHoogland February 4, 2020 12:43
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Feb 4, 2020

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✖centos6 ✔centos7 ✔debian. JID-745

Copy link
Copy Markdown
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested manually

Copy link
Copy Markdown
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ravening this looks great, thanks. Can we rebase this on 4.13, please?

@yadvr yadvr added this to the 4.14.0.0 milestone Feb 5, 2020
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Feb 5, 2020

if @ravening does decide to change to 4.13, we can change the milestone to 4.13.1 @DaanHoogland - it's not exactly a bug fix is it?

@DaanHoogland
Copy link
Copy Markdown
Contributor

It is marked as a bug and could be considered so by a user that has a static address assigned. The want to be visible by that address in the outside world, @rhtyd .

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Feb 6, 2020

@ravening cc @weizhouapache - can you explain how the user-data would be used by the user VM? Could that cause regressions for VMs which were previously expecting the VR/gateway public IP instead?

@andrijapanicsb
Copy link
Copy Markdown
Contributor

andrijapanicsb commented Feb 7, 2020

When static nat is configured for a VM - then it actually shows it's IP address clearly, so this is in line with the fix I believe:
image

@andrijapanicsb andrijapanicsb changed the title Display static NAT as public ip instead of VR ip Userdata to display static NAT as public ip instead of VR ip Feb 7, 2020
@andrijapanicsb
Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@andrijapanicsb a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-929)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 41812 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3862-t929-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Smoke tests completed. 78 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_06_download_detached_volume Failure 138.87 test_volumes.py

@andrijapanicsb
Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@andrijapanicsb a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Feb 12, 2020

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✖centos6 ✖centos7 ✔debian. JID-818

@ravening
Copy link
Copy Markdown
Member Author

@ravening cc @weizhouapache - can you explain how the user-data would be used by the user VM? Could that cause regressions for VMs which were previously expecting the VR/gateway public IP instead?

If VM's dont have static NAT enabled, it will still display VR gateway as public ip. Only when static NAT is enabled, it will display static NAT as public ip. I guess that should be the expected behavior and shouldnt break regression

@ravening ravening changed the base branch from master to 4.13 February 12, 2020 10:13
@ravening ravening changed the base branch from 4.13 to master February 12, 2020 10:15
@ravening ravening changed the base branch from master to 4.13 February 12, 2020 10:21
@ravening ravening changed the base branch from 4.13 to master February 12, 2020 10:22
@weizhouapache weizhouapache changed the base branch from master to 4.13 February 12, 2020 10:33
@andrijapanicsb
Copy link
Copy Markdown
Contributor

will test once I'm available for the sake of "somebody else tested as well" thx.

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Feb 17, 2020

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-1024)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 33752 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3862-t1024-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_ssvm.py
Intermittent failure detected: /marvin/tests/smoke/test_host_maintenance.py
Intermittent failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 77 look OK, 2 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_02_vpc_privategw_static_routes Failure 180.30 test_privategw_acl.py
test_03_vpc_privategw_restart_vpc_cleanup Failure 177.38 test_privategw_acl.py
test_04_rvpc_privategw_static_routes Failure 240.66 test_privategw_acl.py
test_hostha_kvm_host_degraded Error 672.07 test_hostha_kvm.py
test_hostha_kvm_host_fencing Error 665.49 test_hostha_kvm.py

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Feb 19, 2020

Can anyone confirm manual testing? Are we breaking any compatibility?

@DaanHoogland
Copy link
Copy Markdown
Contributor

@wido @GabrielBrascher Are using this kind of configuration? @pdion891 @swill ?

@rhtyd I don't see any related error in that report from the blue ape. I am fine with merging.

/cc @ravening @weizhouapache

@andrijapanicsb
Copy link
Copy Markdown
Contributor

testing it now

@andrijapanicsb
Copy link
Copy Markdown
Contributor

Tested fine, except for a single issue I noticed (might not be related, but it also might)
Happens in both VPC and Isolated network

  • Configure Static NAT, check meta-data - correct Static NAT IP reported
  • Release the IP, without previously disabling the Static NAT - and still the same Public IP is reported in meta-data, which is wrong (should be the source NAT now)
  • FYI: Disabling the Static NAT works fine, again Source NAT is reported in meta-data

The issue has something to do with IPtables and the way we (don't) clean those rules properly, see the image below.

@weizhouapache if you think this is not related to the PR, then we can move forward with this one (and this is kind of edge case, you are usually supposed to disable Static NAT, but again - a valid issue anyway)

image

@weizhouapache
Copy link
Copy Markdown
Member

Tested fine, except for a single issue I noticed (might not be related, but it also might)
Happens in both VPC and Isolated network

  • Configure Static NAT, check meta-data - correct Static NAT IP reported
  • Release the IP, without previously disabling the Static NAT - and still the same Public IP is reported in meta-data, which is wrong (should be the source NAT now)
  • FYI: Disabling the Static NAT works fine, again Source NAT is reported in meta-data

The issue has something to do with IPtables and the way we (don't) clean those rules properly, see the image below.

@weizhouapache if you think this is not related to the PR, then we can move forward with this one (and this is kind of edge case, you are usually supposed to disable Static NAT, but again - a valid issue anyway)

image

@andrijapanicsb thanks for testing !
It is definitely an issue. We will fix it.

Comment thread server/src/main/java/com/cloud/network/router/CommandSetupHelper.java Outdated
If static nat is enabled on VM then metadata service should return
the static nat instead of gateway IP.
If static not is not enabled then it should return the gateway IP
as the public IP

Test results:

Step to reproduce:

1. Create a vm
2. Ssh to vm.
3. Run the below command inside the vm
wget http://<VR public ip>/latest/meta-data/public-ipv4

Note down the output of the above command
4. Now acquire a new public and enable static NAT on that IP to this vm
5. Now run the same command mentioned above in the VM
This should display the static NAT ip instead of VR public IP

Output:

Before enabling static nat

wget http://10.10.10.40/latest/meta-data/public-ipv4
$ cat public-ipv4
10.10.10.29

After enabling static nat

wget http://10.10.10.40/latest/meta-data/public-ipv4
$ cat public-ipv4
10.11.10.30
@ravening
Copy link
Copy Markdown
Member Author

Tested fine, except for a single issue I noticed (might not be related, but it also might)
Happens in both VPC and Isolated network

  • Configure Static NAT, check meta-data - correct Static NAT IP reported
  • Release the IP, without previously disabling the Static NAT - and still the same Public IP is reported in meta-data, which is wrong (should be the source NAT now)
  • FYI: Disabling the Static NAT works fine, again Source NAT is reported in meta-data

The issue has something to do with IPtables and the way we (don't) clean those rules properly, see the image below.

@weizhouapache if you think this is not related to the PR, then we can move forward with this one (and this is kind of edge case, you are usually supposed to disable Static NAT, but again - a valid issue anyway)

image

@andrijapanicsb Thanks for testing. I have updated the commit. Can you please test it again and let me know if you face any issues?

@weizhouapache
Copy link
Copy Markdown
Member

@ravening the new commits is not related to the issue @andrijapanicsb reported.
can you test following what @andrijapanicsb said ?

Release the IP, without previously disabling the Static NAT - and still the same Public IP is reported in meta-data, which is wrong (should be the source NAT now)

@ravening
Copy link
Copy Markdown
Member Author

Release the IP, without previously disabling the Static NAT - and still the same Public IP is reported in meta-data, which is wrong (should be the source NAT now)

@weizhouapache tested the steps which @andrijapanicsb mentioned above.
Yes it does report the public static nat ip instead of source nat ip if I release an IP without disabling the static nat.

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Feb 28, 2020

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✖centos6 ✔centos7 ✔debian. JID-971

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Feb 28, 2020

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-1149)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 28552 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3862-t1149-kvm-centos7.zip
Smoke tests completed. 80 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@DaanHoogland
Copy link
Copy Markdown
Contributor

@andrijapanicsb can you re-test?

Copy link
Copy Markdown
Member

@yadvr yadvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, last smoketest LGTM - kindly test and then merge @andrijapanicsb

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Mar 5, 2020

@andrijapanicsb can you re-test?

@andrijapanicsb
Copy link
Copy Markdown
Contributor

incoming...

@andrijapanicsb andrijapanicsb self-requested a review March 5, 2020 11:44
Copy link
Copy Markdown
Contributor

@andrijapanicsb andrijapanicsb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM based on manual testing (both VPC and Isolated network) - last issues were fixed and all looks good (also validated real remote connections as seen by the outer world - not just the metadata)

Also tested (on Isolated only) with IPs from an additional Public IP range (different VLAN), and all good.

@andrijapanicsb
Copy link
Copy Markdown
Contributor

Merging based on 3 x LGTMs/Approvals and extensive manual testing, and passing regression tests.

@andrijapanicsb andrijapanicsb merged commit 9c6b02f into apache:master Mar 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants